    A tale of two citations

    Systematic Characterizations of Text Similarity in Full Text Biomedical Publications

    Computational methods have been used to find duplicate biomedical publications in MEDLINE. Full-text articles are becoming increasingly available, yet the similarities among them have not been systematically studied. Here, we quantitatively investigated the full-text similarity of biomedical publications in PubMed Central. 72,011 full-text articles from PubMed Central (PMC) were parsed to generate three different datasets: full texts, sections, and paragraphs. Text similarity comparisons were performed on these datasets using the text similarity algorithm eTBLAST. We measured the frequency of similar text pairs and compared it among the datasets. We found that high abstract similarity can be used to predict high full-text similarity with a specificity of 20.1% (95% CI [17.3%, 23.1%]) and a sensitivity of 99.999%. Abstract similarity and full-text similarity have a moderate correlation (Pearson correlation coefficient: -0.423) when the similarity ratio is above 0.4. Among pairs of articles in PMC, methods sections are the most repetitive (frequency of similar pairs, methods: 0.029, introduction: 0.0076, results: 0.0043). In contrast, among a set of manually verified duplicate articles, results are the most repetitive sections (frequency of similar pairs, results: 0.94, methods: 0.89, introduction: 0.82). Repetition of introduction and methods sections is more likely to be committed by the same authors (odds of a highly similar pair having at least one shared author, introduction: 2.31, methods: 1.83, results: 1.03).
There is also significantly more similarity in pairs of review articles than in pairs containing one review and one nonreview paper (frequency of similar pairs: 0.0167 and 0.0023, respectively). While quantifying abstract similarity is an effective approach for finding duplicate citations, a comprehensive full-text analysis is necessary to uncover all potential duplicate citations in the scientific literature and is helpful when establishing ethical guidelines for scientific publications.
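The summary statistics quoted above (sensitivity, specificity, odds, Pearson correlation) follow standard definitions. The sketch below shows those definitions in code, assuming a pair is "positive" when its full texts are highly similar and "flagged" when its abstracts are highly similar; the counts in the usage line are hypothetical and are not the study's data. The function names are illustrative, not part of eTBLAST.

```python
import math
from typing import Sequence


def sensitivity(tp: int, fn: int) -> float:
    """True-positive rate: flagged similar pairs / all truly similar pairs."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True-negative rate: unflagged dissimilar pairs / all truly dissimilar pairs."""
    return tn / (tn + fp)


def odds(with_shared_author: int, without_shared_author: int) -> float:
    """Odds that a highly similar pair has at least one shared author."""
    return with_shared_author / without_shared_author


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between paired abstract and full-text similarity scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical counts, for illustration only (not the study's data).
print(sensitivity(tp=999, fn=1), specificity(tn=20, fp=80))
```

A very high sensitivity paired with a low specificity, as reported above, means abstract similarity rarely misses a full-text duplicate but flags many pairs whose full texts turn out not to be highly similar.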

    eTBLAST: a web server to identify expert reviewers, appropriate journals and similar publications

    Authors, editors and reviewers alike use the biomedical literature to identify appropriate journals in which to publish, potential reviewers for papers or grants, and collaborators (or competitors) with similar interests. Traditionally, this process has relied either upon personal expertise and knowledge or upon a somewhat unsystematic and laborious process of manually searching through the literature for trends. To help with these tasks, we report three utilities that parse and summarize the results of an abstract similarity search to find appropriate journals for publication, authors with expertise in a given field, and documents similar to a submitted query. The utilities are based upon a program, eTBLAST, designed to identify similar documents within literature databases such as (but not limited to) MEDLINE. These services are freely accessible through the Internet at http://invention.swmed.edu/etblast/etblast.shtml, where users can upload a file or paste text such as an abstract into the browser interface.
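The idea of summarizing a similarity search into journal and expert suggestions can be sketched as follows. This is not eTBLAST's algorithm: it swaps in plain bag-of-words cosine similarity for ranking, and it assumes a hypothetical record layout (`abstract`, `journal`, `authors`) and a hypothetical scoring rule (summing hit scores per journal and per author).

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case word tokens; a stand-in for real preprocessing."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_similar(query, corpus, top_k=10):
    """Score each record's abstract against the query; return the top_k (score, record) pairs."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(r["abstract"]))), r) for r in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def _ranked(totals):
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


def summarize(hits):
    """Aggregate hit scores per journal and per author (candidate venues and reviewers)."""
    journals, authors = defaultdict(float), defaultdict(float)
    for score, record in hits:
        journals[record["journal"]] += score
        for name in record["authors"]:
            authors[name] += score
    return _ranked(journals), _ranked(authors)


# Hypothetical two-record corpus.
corpus = [
    {"abstract": "detecting duplicate publications in MEDLINE with text similarity",
     "journal": "Journal X", "authors": ["A. Author", "B. Author"]},
    {"abstract": "DNA hairpin melting kinetics", "journal": "Journal Y", "authors": ["C. Author"]},
]
journals, authors = summarize(rank_similar("duplicate publications in MEDLINE", corpus))
print(journals[0], authors[0])
```

Summing scores rewards journals and authors that recur across many close hits, which is one simple way to turn a ranked document list into venue and reviewer suggestions.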

    Modeling DNA beacons at the mesoscopic scale

    We report model calculations on DNA single strands which describe the equilibrium dynamics and kinetics of hairpin formation and melting. Modeling is at the level of single bases. Strand rigidity is described in terms of simple polymer models; alternative calculations performed using the freely rotating chain and the discrete Kratky-Porod models are reported. Stem formation is modeled according to the Peyrard-Bishop-Dauxois Hamiltonian. The kinetics of opening and closing is described in terms of a diffusion-controlled motion in an effective free energy landscape. Melting profiles, dependence of melting temperature on loop length, and kinetic time scales are in semiquantitative agreement with experimental data obtained from fluorescent DNA beacons forming poly(T) loops. Variation in strand rigidity is not sufficient to account for the large activation enthalpy of closing and the strong loop length dependence observed in hairpins forming poly(A) loops. Implications for modeling single strands of DNA or RNA are discussed.
    Comment: 15 pages, 17 figures, submitted to Eur. J. Phys.
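For readers unfamiliar with the Peyrard-Bishop-Dauxois (PBD) model named above, the sketch below evaluates the potential part of its standard form: an on-site Morse potential V(y) = D(exp(-a y) - 1)^2 per base pair plus an anharmonic stacking term W = (k/2)(1 + rho exp(-alpha(y_n + y_{n-1})))(y_n - y_{n-1})^2 between neighbours, where y_n is the stretching of base pair n. The kinetic term is omitted, the hairpin-specific adaptation used in the paper may differ, and the numerical parameters are illustrative assumptions, not the paper's values.

```python
import math
from typing import Sequence


def morse(y: float, D: float, a: float) -> float:
    """On-site Morse potential V(y) = D (exp(-a y) - 1)^2 for one base pair."""
    return D * (math.exp(-a * y) - 1.0) ** 2


def stacking(y_n: float, y_prev: float, k: float, rho: float, alpha: float) -> float:
    """Anharmonic stacking W = (k/2)(1 + rho exp(-alpha (y_n + y_prev))) (y_n - y_prev)^2."""
    return 0.5 * k * (1.0 + rho * math.exp(-alpha * (y_n + y_prev))) * (y_n - y_prev) ** 2


def pbd_potential(ys: Sequence[float], D: float, a: float,
                  k: float, rho: float, alpha: float) -> float:
    """Potential part of the PBD Hamiltonian for stretches ys (kinetic term omitted)."""
    on_site = sum(morse(y, D, a) for y in ys)
    coupling = sum(stacking(ys[i], ys[i - 1], k, rho, alpha) for i in range(1, len(ys)))
    return on_site + coupling


# Illustrative parameters only, not the values used in the paper.
params = dict(D=0.04, a=4.45, k=0.04, rho=0.5, alpha=0.35)
closed = pbd_potential([0.0] * 5, **params)
partly_open = pbd_potential([0.0, 0.0, 2.0, 2.0, 0.0], **params)
print(closed, partly_open)
```

The closed stem sits at zero energy; opening base pairs costs Morse energy that saturates at D per pair, while the exponential factor makes stacking stiffer for closed neighbours than for open ones, which is what gives the PBD model its sharp melting behaviour.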

    Disappearance of Azoxystrobin and difenoconazole in green beans cultivated in Souss Massa valley (Morocco)

    A study was undertaken to evaluate the degradation behavior and residue levels of azoxystrobin and difenoconazole in green beans of the Belma variety grown in an experimental plastic greenhouse. Measurements were made over a 3-week period covering up to two successive treatments with azoxystrobin and over a 5-week period covering up to two successive treatments with difenoconazole. Residue levels of azoxystrobin and difenoconazole were determined using liquid-liquid extraction (LLE) and solid-phase extraction (SPE) together with gas chromatography with electron capture detection (GC-ECD) and high-performance liquid chromatography (HPLC). During the study, residue levels in the plantation ranged from 0.01 to 0.35 mg/kg for azoxystrobin and from 0.01 to 0.25 mg/kg for difenoconazole. The residual concentrations after the preharvest intervals (PHI) were below the legal limits.
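Degradation data of this kind are often summarized with a first-order dissipation model, C(t) = C0 exp(-k t), and its half-life DT50 = ln 2 / k. The abstract does not state which kinetic model was used, so the sketch below is only an assumed illustration: it fits the model by least squares on ln C, and the sampling days and residues are synthetic, not the study's measurements.

```python
import math
from typing import Sequence, Tuple


def fit_first_order(days: Sequence[float], residues: Sequence[float]) -> Tuple[float, float, float]:
    """Fit ln C(t) = ln C0 - k t by ordinary least squares.

    Returns (C0 in mg/kg, k per day, half-life in days = ln 2 / k).
    """
    ys = [math.log(c) for c in residues]
    n = len(days)
    mean_t, mean_y = sum(days) / n, sum(ys) / n
    s_tt = sum((t - mean_t) ** 2 for t in days)
    s_ty = sum((t - mean_t) * (y - mean_y) for t, y in zip(days, ys))
    slope = s_ty / s_tt
    k = -slope
    c0 = math.exp(mean_y - slope * mean_t)
    return c0, k, math.log(2) / k


# Synthetic sampling days and residues (mg/kg), not the study's measurements.
days = [0, 3, 7, 14, 21]
residues = [0.35 * math.exp(-0.17 * t) for t in days]
c0, k, half_life = fit_first_order(days, residues)
print(f"C0={c0:.3f} mg/kg, k={k:.3f}/day, DT50={half_life:.1f} days")
```

Fitting on the log scale keeps the regression linear; with real field data the residuals should be checked, since successive treatments can produce concentration jumps that a single exponential does not capture.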

    Identifying duplicate content using statistically improbable phrases

    Motivation: Document similarity metrics such as PubMed's ‘Find related articles’ feature, which have been primarily used to identify studies with similar topics, can now also be used to detect duplicated or potentially plagiarized papers within literature reference databases. However, the CPU-intensive nature of document comparison has limited MEDLINE text similarity studies to the comparison of abstracts, which constitute only a small fraction of a publication's total text. Extending searches to include text archived by online search engines would drastically increase comparison ability. For large-scale studies, submitting short phrases enclosed in quotation marks to search engines for exact matches would be optimal for both individual queries and programmatic interfaces. We have derived a method of analyzing statistically improbable phrases (SIPs) for assistance in identifying duplicate content.
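The general idea of picking statistically improbable phrases and wrapping them in quotes for exact-match queries can be sketched as below. The abstract does not give the paper's scoring rule, so this uses an assumed heuristic: rank a document's word n-grams by their count in the document divided by one plus their count in a background corpus. The function names and example texts are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List


def ngrams(text: str, n: int) -> List[str]:
    """Word n-grams of a lower-cased text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def background_counts(corpus: Iterable[str], n: int) -> Counter:
    """n-gram frequencies over a reference corpus."""
    counts = Counter()
    for text in corpus:
        counts.update(ngrams(text, n))
    return counts


def improbable_phrases(document: str, background: Counter, n: int = 4, top_k: int = 5) -> List[str]:
    """Rank the document's n-grams by doc_count / (1 + background_count) and
    return the top_k as quoted strings ready for an exact-match search query."""
    doc_counts = Counter(ngrams(document, n))
    ranked = sorted(doc_counts,
                    key=lambda g: doc_counts[g] / (1 + background[g]),
                    reverse=True)
    return [f'"{g}"' for g in ranked[:top_k]]


# Hypothetical background corpus and document.
bg = background_counts(["the results of this study show", "the results of this study"], n=3)
print(improbable_phrases("the results of this study suggest azoxystrobin residues decline",
                         bg, n=3, top_k=4))
```

Common boilerplate such as "the results of" scores low because it is frequent in the background, so the returned phrases favour wording that is specific to the document and therefore more informative as exact-match queries.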

    Déjà vu: a database of highly similar citations in the scientific literature

    In the scientific research community, plagiarism and covert multiple publications of the same data are considered unacceptable because they undermine public confidence in scientific integrity. Yet, little has been done to help authors and editors identify highly similar citations, which sometimes may represent cases of unethical duplication. For this reason, we have created Déjà vu, a publicly available database of highly similar Medline citations identified by the text similarity search engine eTBLAST. Following manual verification, highly similar citation pairs are classified into various categories ranging from duplicates with different authors to sanctioned duplicates. Déjà vu records also contain user-provided commentary and supporting information to substantiate each document's categorization. Déjà vu and eTBLAST are available to authors, editors, reviewers, ethicists and sociologists to study, intercept, annotate and deter questionable publication practices. These tools are part of a sustained effort to enhance the quality of Medline as ‘the’ biomedical corpus. The Déjà vu database is freely accessible at http://spore.swmed.edu/dejavu. The tool eTBLAST is also freely available at http://etblast.org
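The workflow described above (a pair is flagged as highly similar, then manually verified and assigned a category, with commentary attached) can be modelled with a small record type. This is a minimal sketch, not the Déjà vu schema: the field names, the placeholder identifiers, the similarity threshold, and the free-text category strings are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class SimilarPair:
    """One highly similar citation pair (identifiers are hypothetical)."""
    pmid_a: str
    pmid_b: str
    similarity: float
    category: Optional[str] = None  # None until manually verified
    comments: List[str] = field(default_factory=list)


def review_queue(pairs: Iterable[SimilarPair], threshold: float) -> List[SimilarPair]:
    """Unverified pairs at or above the threshold, most similar first."""
    pending = [p for p in pairs if p.category is None and p.similarity >= threshold]
    return sorted(pending, key=lambda p: p.similarity, reverse=True)


pairs = [
    SimilarPair("100001", "100002", 0.92),
    SimilarPair("100003", "100004", 0.97, category="sanctioned duplicate"),
    SimilarPair("100005", "100006", 0.35),
]
for p in review_queue(pairs, threshold=0.5):
    print(p.pmid_a, p.pmid_b, p.similarity)
```

Keeping the category empty until a curator sets it separates automatic detection from human judgement, which matches the abstract's point that classification happens only after manual verification.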